8 research outputs found

Automatic Extraction Of Malay Compound Nouns Using A Hybrid Of Statistical And Machine Learning Methods

Author: A. S. Hazaa Muneer
Albared Mohammed
Ba-Alwi Fadl Mutaher
Omar Nazlia
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/06/2016
Identifying compound nouns is important for a wide spectrum of natural language processing applications, such as machine translation and information retrieval. Extracting compound nouns requires deep or shallow syntactic preprocessing tools and large corpora. This paper investigates several methods for extracting noun compounds from Malay text corpora. First, we present empirical results for sixteen statistical association measures applied to Malay <N+N> compound noun extraction. Second, we introduce the possibility of integrating multiple association measures. Third, this work provides a standard dataset intended as a common platform for evaluating research on identifying compound nouns in the Malay language. The dataset contains 7,235 unique N-N candidates, of which 2,970 are N-N compound noun collocations. The extraction algorithms are evaluated against this reference dataset. The experimental results demonstrate that a group of association measures (T-test, Piatetsky-Shapiro (PS), C_value, FGM, and the rank combination method) outperforms the other association measures for <N+N> collocations in the Malay corpus. Finally, we describe several classification methods for combining the scores of the basic association measures, followed by their evaluation. Evaluation results show that classification algorithms significantly outperform individual association measures. The experimental results are quite satisfactory in terms of precision, recall, and F-score.
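As a rough illustration of two of the association measures named in the abstract, the sketch below scores N+N candidates with the textbook t-test and Piatetsky-Shapiro formulas. It is not the authors' implementation: the function names, the toy counts, and the corpus size are assumptions made up for this example.

```python
import math

def t_score(c_xy, c_x, c_y, n):
    """Textbook t-test score for a bigram (x, y).

    c_xy: bigram count (must be > 0), c_x / c_y: unigram counts,
    n: total number of bigrams in the corpus.
    """
    expected = c_x * c_y / n          # count expected if x and y were independent
    return (c_xy - expected) / math.sqrt(c_xy)

def piatetsky_shapiro(c_xy, c_x, c_y, n):
    """Piatetsky-Shapiro score: P(xy) - P(x) * P(y)."""
    return c_xy / n - (c_x / n) * (c_y / n)

def rank_candidates(bigrams, unigrams, n, measure):
    """Return candidate pairs sorted from strongest to weakest association."""
    scored = [
        (measure(c, unigrams[a], unigrams[b], n), (a, b))
        for (a, b), c in bigrams.items()
    ]
    return [pair for _, pair in sorted(scored, reverse=True)]

# Hypothetical toy counts, not taken from the paper's dataset.
N = 1000
unigrams = {"kereta": 20, "api": 10, "buku": 50}
bigrams = {("kereta", "api"): 8, ("buku", "kereta"): 2}

ranking = rank_candidates(bigrams, unigrams, N, t_score)
```

With these invented counts, the frequent pair "kereta api" ranks above the incidental pair under both measures; a real evaluation would compare such rankings against the reference dataset the abstract describes.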

IAES journal

Crossref

Institute of Advanced Engineering and Science

A Survey on Unsupervised K-Means Algorithm in Big Data Environment

Author: Al-deen Fatama Sharf
Ba-Alwi Fadl Mutaher
Publication venue: SCIENCEDOMAIN international
Publication date: 24/08/2021
Due to the rapid development of information technology, Big Data has become one of its prominent features and has had a great impact on other data-related technologies, such as machine learning. K-means is one of the most important machine learning algorithms. The algorithm was first developed as a clustering technique for relational databases. However, the advent of Big Data has greatly affected its performance. Therefore, many researchers have proposed approaches to improve K-means accuracy in Big Data environments. In this paper, we present a literature review of the different techniques proposed for developing the K-means algorithm in Big Data. We compare them according to several criteria, including the proposed algorithm, the database used, Big Data tools, and K-means applications. This paper helps researchers see the most important challenges and trends of the K-means algorithm in the Big Data environment.

Asian Journal of Research in Computer Science

Intrusion detection model using machine learning algorithm on Big Data environment

Author: Amal Y. Al-Hashida
Fadl Mutaher Ba-Alwi
Nabeel T. Alsohybe
Suad Mohammed Othman
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/09/2018
Recently, the huge and steadily growing volume of data has changed the importance of information security and data analysis systems for Big Data. An intrusion detection system (IDS) monitors and analyzes data to detect any intrusion in the system or network. The high volume, variety, and speed of data generated in the network make it very difficult for traditional techniques to analyze the data and detect attacks. Big Data techniques are used in IDS to handle this data for accurate and efficient analysis. This paper introduces the Spark-Chi-SVM model for intrusion detection. In this model, we use ChiSqSelector for feature selection and build an intrusion detection model using a support vector machine (SVM) classifier on the Apache Spark Big Data platform. We used KDD99 to train and test the model. In the experiment, we compare the Chi-SVM classifier with the Chi-Logistic Regression classifier. The results show that the Spark-Chi-SVM model has high performance, reduces the training time, and is efficient for Big Data.
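The chi-squared feature selection step mentioned in the abstract can be sketched in plain Python as follows. This is a stand-in for illustration only, not Spark's ChiSqSelector and not the authors' pipeline: the function names and the four toy records are assumptions, and the SVM training stage is left out.

```python
from collections import Counter

def chi_square(feature, labels):
    """Pearson chi-squared statistic between a categorical feature and the labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))   # joint counts; missing cells read as 0
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for l, l_count in label_totals.items():
            expected = f_count * l_count / n   # count expected under independence
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat

def select_top_k(columns, labels, k):
    """Keep the k feature names with the highest chi-squared score."""
    scores = {name: chi_square(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy records; column names only loosely mirror KDD99-style fields.
labels = ["attack", "attack", "normal", "normal"]
columns = {
    "protocol_type": ["tcp", "tcp", "udp", "udp"],  # perfectly tied to the label
    "flag": ["SF", "REJ", "SF", "REJ"],             # independent of the label
}

selected = select_top_k(columns, labels, k=1)
```

In this toy data the feature that fully separates the two classes receives the highest score and is the one kept; in the setting the abstract describes, the selected features would then be passed to the SVM classifier.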

Directory of Open Access Journals

Optimization Iteration Min-Max Cross Over Genetic Algorithm To Generate Fuzzy membership Function Automatically

Crossref